Exploiting spatio‐temporal knowledge for video action recognition

نویسندگان

چکیده

Action recognition has been a popular area of computer vision research in recent years. The goal this task is to recognise human actions video frames. Most existing methods often depend on the visual features and their relationships inside videos. extracted only represent information current itself cannot general knowledge particular beyond video. Thus, there are some deviations these features, performance still requires improvement. In sudy, we present novel spatio-temporal module (STKM) endow with commonsense knowledge. To end, first collect hybrid external from universal fields, which contains both semantic information. Then graph convolution networks (GCN) used aggregate GCNs involve (i) spatial capture relations (ii) temporal serial occurrence among actions. By integrating can get better results. Experiments AVA, UCF101-24 JHMDB datasets show robustness generalisation ability STKM. results report new state-of-the-art 32.0 mAP AVA v2.1. On datasets, our method also improves by 1.5 AP 2.6 AP, respectively, over baseline method.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Spatiotemporal Residual Networks for Video Action Recognition

Two-stream Convolutional Networks (ConvNets) have shown strong performance for human action recognition in videos. Recently, Residual Networks (ResNets) have arisen as a new technique to train extremely deep architectures. In this paper, we introduce spatiotemporal ResNets as a combination of these two approaches. Our novel architecture generalizes ResNets for the spatiotemporal domain by intro...

متن کامل

Spatiotemporal Networks for Video Emotion Recognition

Our article presents an audio-visual based multi-modal emotion classification system. Considering the fact of deep learning approaches to facial analysis have recently demonstrated high performance, in our work, we use convolutional neural networks (CNNs) for emotion recognition in video, relying on temporal averaging and pooling operations reminiscent of widely used approaches for the spatial ...

متن کامل

Exploiting semantic knowledge for robot object recognition

This paper presents a novel approach that exploits semantic knowledge to enhance the object recognition capability of autonomous robots. Semantic knowledge is a rich source of information, naturally gathered from humans (elicitation), which can encode both objects’ geometrical/appearance properties and contextual relations. This kind of information can be exploited in a variety of robotics skil...

متن کامل

Compressed Video Action Recognition

Training robust deep video representations has proven to be much more challenging than learning deep image representations and consequently hampered tasks like video action recognition. This is in part due to the enormous size of raw video streams, the associated amount of computation required, and the high temporal redundancy. The ‘true’ and interesting signal is often drowned in too much irre...

متن کامل

Action recognition in video

Automatic action recognition in video has a broad array of applications, from surveillance to interactive video games. Classic algorithms usually use handcrafted descriptors such as SIFT (see [5]) or HOG (see [3]) to compute feature vectors of videos, and have achieved promising results in the past (see [7]). More recently, Quoc Le and Will Zou at the Stanford AI lab have proved that ISA featur...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

ژورنال

عنوان ژورنال: Iet Computer Vision

سال: 2022

ISSN: ['1751-9632', '1751-9640']

DOI: https://doi.org/10.1049/cvi2.12154